PREFACE


This report was prepared by Professor Shin-yi Hsu of the State University
of New York, Binghamton, New York, in partial fulfillment of contract
F30602-76-C-0211 for the Rome Air Development Center, Griffiss AFB, New
York. The work incorporated in this task consisted of texture-tone analysis,
software development, analysis of digitized black and white aerial
photography, and estimation of stable parameters of the texture variables. The
project was carried out using both the DICIFER Image Processing System of
the RADC Image Processing Facility and the SUNY-Binghamton Image Data
Processing System.

The work described in this report was performed by Dr. Shin-yi Hsu,
Principal Investigator, Dr. Eugene Klimko, Faculty Associate, and
Graduate Assistants.

This study was performed during the period April 1976 through May 1977.
Capt. Gregory B. Pavlin and Lt. Cyril Speyrer were the RADC Project
Monitors.




LIST OF TABLES


Table

3-1  The Texture Tone Variables of Model I
3-2  Additional Variables in Model II
5-1  The Data Set
5-2  Hit Rates
6-1  Values of a for VPHA


TABLE OF CONTENTS


Section

1.      Introduction and Summary
2.      Background
3.      Texture Feature Extraction: A Review and a New Measure
3.1     A Literature Review
3.1.1   Background
3.1.2   The Haralick Measure
3.1.3   Mitchell's Max-Min Descriptor
3.2     A New Measurement
3.3     Comparisons Among the Three Measurements
4.      The Development of the Mahalanobis Classifier
4.1     Background
4.2     The General Classification Principle
4.3     Different Approaches
4.4     The Development of the Mahalanobis Classifier
4.4.1   Classification Rules
4.4.2   Separation of Classes
4.4.3   Ordination Procedures
5.      The Analysis
5.1     The Data Set
5.2     The Hardware Facility
5.3     The Software System
5.4     Generation of a Decision Map
5.5     Hit-Rate Analysis
5.5.1   GALA
5.5.2   SBLA
5.5.3   VPLA
5.5.4   URLA
5.5.5   SBHA
5.5.6   VPHA
5.5.7   GAHA and URHA
        A General Comment on the Decision Map Making
6.      Estimation of Stable Parameters
6.1     Characteristics of Stable Distributions
6.2     Estimation Methods
6.3     Preliminary Results: Non-Normal Behavior of the Texture Variables
        References


LIST OF FIGURES


Figure

GALA: ORIGINAL DIGITIZED PHOTO
GAHA: ORIGINAL DIGITIZED PHOTO
VPLA: ORIGINAL DIGITIZED PHOTO
VPHA: ORIGINAL DIGITIZED PHOTO
SBLA: ORIGINAL DIGITIZED PHOTO
SBHA: ORIGINAL DIGITIZED PHOTO
URLA: ORIGINAL DIGITIZED PHOTO
URHA: ORIGINAL DIGITIZED PHOTO
COLOR CODES FOR THE DECISION MAP
DECISION MAP: GALA
DECISION MAP: VPLA
DECISION MAP: SBLA
DECISION MAP:


EVALUATION 


This final report covers texture feature extraction by means of measuring
the spatial distribution of TONES of the pixels of a given area. Both 1st
order and 2nd order statistics are used. This effort is under TPO Thrust R2D
precision targeting. It represents a fine tuning of feature extraction
and image classification that will be used in applications to the Automatic
Feature Extraction System (AFES) being developed at RADC.


DONALD  A.  BUSH 
Project  Engineer 
SECTION 1


INTRODUCTION  AND  SUMMARY 

Current image processing capability at RADC employs tonal, spatial, and
limited texture feature extractors. To meet the demand for a powerful
feature extractor for real-time object cuing systems and for systems that
match pairs of sensed and reference maps, this study was conducted with two
specific objectives: 1) to develop and implement software for texture-tone
feature extraction algorithms, and 2) to evaluate these algorithms' potential
for object identification and terrain classification using a digitized
photographic data set. This effort will also provide additional support to a
current RADC program with AFATL in the semi-automatic classification of ten
terrain types with black/white high altitude photographs.

During the course of this study, Rome Air Development Center (RADC)
provided digitized image data, the DICIFER (Digital Interactive Complex for
Image Feature Extraction and Recognition) system for selecting training sets,
and the Color Printer for generating color decision maps. The major task of
image data processing was conducted at SUNY-Binghamton with the following
programs developed for this effort: 1) texture analysis using an (n x n)
window to generate 17 to 23 texture-tone variables for each pixel; 2) the
Mahalanobis $D^2$ logic for classifying pixels into one of the training sets
or a reject category, with a generalized-inverse scheme; 3) a step-wise
discriminant analysis to select the significant texture variables for the
classifier, with a confusion matrix indicating the hit-rate of the training
set data; 4) generation of numerical classification results on a grid of
(10 x 10) cells for a hit-rate analysis; 5) generation of the decision maps
using the IBM 370 system; 6) manual selection of training sets; and
7) pre-processing capabilities using principal component analysis and factor
analysis.

Eight scenes, four low altitude and four high altitude photographs,
from the RADC Northeast Test Area (NETA) were used to evaluate the potential
of the developed texture analysis algorithms. Terrain types being mapped
include metal, pavement, soil, cultivated field, vegetation, water, and
composition (a mixture of several categories, such as an urbanized area). In
some instances, sub-categories are used in the training sets, such as two
types each of pavement, cultivated field, and vegetation.

The results indicate that the developed texture feature extractor,
together with the Mahalanobis classifier, is capable of discriminating the
specified terrain types with a high degree of accuracy: a hit-rate of
approximately 90% has been obtained using properly digitized photographic
data. It is also believed that the hit-rate can be improved further by
employing a new classifier that takes into consideration the skewness of the
image data, since about 50% of the texture variables are not normally
distributed.

The main body of this report includes a literature review of texture
analysis, the Mahalanobis classifier, and the analysis of the eight
scenes. A preliminary investigation of the potential of stable distribution
theory as a classifier is also given.

Owing to operational difficulties with the RADC color printer, only
certain decision maps were produced in a color-print format. For analysis
purposes, only the numerical classification results based on (10 x 10) cells
were compared against the human interpretation. Hence, the absence of the
remaining color decision maps did not impair the hit-rate analysis.


Finally, the principal investigator would like to express his gratitude
to Captain Greg Pavlin for his technical assistance on this project at RADC.


SECTION  2 


BACKGROUND

During 1975 and 1976, RADC sponsored a study titled "Digital Image
Processing Techniques for Automatic Terrain Classification for Generating
Reference Maps From B/W Aerial Photography," conducted by Pattern Analysis &
Recognition Corporation, Rome, New York. Since the correct classification
rate of that study was about 80% using the feature extractor and the
classifier of the DICIFER System, RADC personnel felt that another study was
needed to improve the hit-rate by developing a more powerful texture feature
extractor. This led to the current project conducted by Dr. Hsu, using the
same data set as the 1975-76 study.

Many factors influence the hit-rate of an image data processing system;
the major ones include the performance of the feature extractor and of the
classifier. The DICIFER System has a limited capability for texture analysis
since only six first-order statistics are utilized: mean, standard
deviation, range, median, high, and low. Therefore, to improve the hit-rate, a
feature extractor using second-order statistics had to be developed.

The classifier of the DICIFER System employs the Fisher Pairwise logic.
It is similar to the conventional classifier based on linear discriminant
functions. The hit-rate can be improved further if one employs a classifier
whose mathematical assumptions fit the data better. Hence in this effort a
new classifier named the Mahalanobis Logic was developed to accommodate the
dispersion characteristics of different training sets. A generalized inverse
scheme was also developed to handle singularity of the dispersion matrices
for each group. In fact, it has been determined that about 30% of the
dispersion matrices of the texture variables are singular or nearly singular
and cannot be inverted under normal conditions.

The hit-rate can also be influenced by the sample size, the location,
and the number of the training sets. These problems are technical ones
and will not be discussed in detail in this report. Researchers, however,
should be aware of them.

In the late 1960s and early 1970s remote sensing researchers found that
spectral data are largely not normally distributed. The conventional
classifiers based on the normal assumptions work only in an empirical sense.
RADC personnel also felt that it was worthwhile to determine the degree
of the non-normal behavior of the texture-tone data. The last part of this
report is therefore devoted to the discussion of this problem. A new
classifier based on stable distribution theory, with the normal distribution
as a special case, will be investigated in the Phase II effort of this study.
SECTION 3


TEXTURE FEATURE EXTRACTION: A REVIEW AND A NEW MEASURE

3.1 A LITERATURE REVIEW

3.1.1  Background 

For years, texture has been recognized by the photo-interpreter as one of
the important criteria for identifying objects and scenes, along with other
variables such as tone, size, shape, and associated features. Here texture
means the apparent minute pattern of detail of a given area, described
ordinarily by terms such as smooth, fine, rough, and coarse. In digital data
processing, texture means the spatial distribution of tones of the pixels of
a given area. Its attributes have to be specified by the investigator, a
specific field of study termed texture feature extraction.

Texture analysis is a rather recent but rapidly growing field of
inquiry, though its importance in visual perception was recognized by
Gibson as early as 1950. Over the past twenty years, many texture measures
have been proposed. This body of literature was reviewed by Rosenfeld
in 1975. In general, these measures can be grouped into two categories:
Fourier-based (power spectrum) features and statistical features.
Furthermore, it has been found that statistical features perform much better
than Fourier-based features (Rosenfeld, 1975).

To obtain texture features, the analyst must specify the size of the
window or control area, composed of n x n or n x m pixels, from which texture
measures are to be obtained and analyzed. Furthermore, one can classify
either only the center point of the window or the whole of the window into
one of the specified groups or a reject category. For detailed mapping
purposes, the former process is required.

Texture features may include such first-order statistics as mean,
standard deviation, range, median, and extreme highs and lows. More
significant are the second-order statistics, which describe how various
pairs of pixels occur in specified spatial relationships.
3.1.2 The Haralick Measure


In the early 1960s, Julesz employed transition probabilities to
characterize textures using scanned digital data. Here for any two grey
levels i and j, the transition probability p(i,j) measures how often level i
and level j occur in horizontally adjacent positions. This concept has been
followed by many investigators, and expanded to include directions other than
horizontal, and pairs of points that are nonadjacent. This is precisely the
concept of the spatial dependence matrix introduced by Haralick (1970).
Texture measures are then computed from a series of dependence matrices
derived from eight scan angles, with elements representing the relative
frequency of tone levels of neighboring cells separated by a predefined
distance. The basic texture measures of Haralick's method are the angular
second moment (ASM), the angular second moment difference (ASMD), the angular
second moment inverse difference (ASMID), and the correlation between
neighboring grey tones (COR). Using the directional parameters (0°, 45°, 90°,
135°), one can obtain three measures, namely average, range, and deviation,
from each of the basic measures (ASM, etc.). Originally, he proposed to
employ 36 texture context features for classification purposes. Since his
sample size was small, he selected the following 12 features.

For distance 1: ASM (average, range, deviation),
                COR (average, range, deviation), and
                ASMID (average, range, deviation)

For distance 3: ASM (average)
                ASM (range)
                ASM (deviation)

The results of his experiment yielded a 70% correct rate using a
maximum likelihood classification logic with a normality assumption for the
data set. It should be noted that the unit of this analysis is a scene
(window), not an individual pixel. The same method was also employed by Dyer
et al. in terrain classification with LANDSAT data, yielding a higher
hit-rate (about 90%).
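
To illustrate the spatial dependence matrix, the following Python sketch counts how often grey levels i and j occur at a fixed offset and computes the angular second moment as the sum of the squared relative frequencies (the standard definition of ASM). The function names, the symmetric counting option, and the toy images are assumptions made for this illustration; they are not taken from Haralick's software or from the programs of this study.

```python
def cooccurrence(image, levels, dr=0, dc=1, symmetric=True):
    """Relative frequency p(i, j) of grey levels i and j at offset (dr, dc).

    `image` is a list of rows of integer grey levels in range(levels).
    The default offset (0, 1) pairs each pixel with its right-hand neighbor.
    Assumes the image contains at least one pixel pair at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count the pair in both directions
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def angular_second_moment(p):
    """Sum of squared entries of the normalized dependence matrix."""
    return sum(v * v for row in p for v in row)


flat = [[1, 1], [1, 1]]      # a single tone: one matrix entry holds everything
checker = [[0, 1], [1, 0]]   # alternating tones: mass split over two entries
print(angular_second_moment(cooccurrence(flat, levels=2)))     # 1.0
print(angular_second_moment(cooccurrence(checker, levels=2)))  # 0.5
```

A uniform area concentrates all pairs in one cell and gives the largest ASM, while a busier area spreads the pairs out and lowers it. Repeating the call with other offsets gives the directional values from which an average, range, and deviation can be taken.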

3.1.3 Mitchell's Max-Min Descriptor
Recently, Mitchell and Myers proposed a new measure for texture
classification based on the intuition, drawn from the human visual system,
that the important texture information is contained in the relative frequency
of local extremes of various sizes in intensity. Thus it is called a max-min
texture descriptor. The principal measure here is the number of maxima and
minima along a one-dimensional scan direction, under certain threshold
conditions. For instance, a local peak is counted as a maximum only if the
intensity falls the threshold amount below the peak before a higher valued
intensity is encountered. Thus, by repeating the process for several
threshold settings, analogous to Haralick's distance setting, one obtains a
vector of numbers characterizing the textures.
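
The threshold condition can be sketched in Python as follows. This is a minimal illustration of counting thresholded extrema along one scan line, not Mitchell and Myers' implementation: the function names, the choice to start the scan by looking for a maximum, and the use of the total count per threshold are all assumptions, and the threshold is assumed to be positive.

```python
def count_extrema(scan, threshold):
    """Count maxima and minima whose swing is at least `threshold` (> 0).

    A running peak is accepted as a maximum once the intensity has fallen
    `threshold` below it before any higher value appears; minima are
    handled symmetrically.
    """
    maxima = minima = 0
    highest, lowest = float("-inf"), float("inf")
    seeking_max = True  # assumption: the scan starts by looking for a maximum
    for value in scan:
        highest = max(highest, value)
        lowest = min(lowest, value)
        if seeking_max:
            if value <= highest - threshold:
                maxima += 1
                lowest = value        # restart the search for a minimum here
                seeking_max = False
        elif value >= lowest + threshold:
            minima += 1
            highest = value           # restart the search for a maximum here
            seeking_max = True
    return maxima, minima


def max_min_descriptor(scan, thresholds):
    """Vector of extrema counts, one entry per threshold setting."""
    return [sum(count_extrema(scan, t)) for t in thresholds]


print(count_extrema([0, 5, 0, 5, 0], threshold=3))         # (2, 1)
print(max_min_descriptor([0, 5, 0, 5, 0], thresholds=[3, 6]))  # [3, 0]
```

Small ripples that never swing by the threshold amount are ignored, so raising the threshold keeps only the coarser structure of the scan line.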

To make this measure invariant to illumination and resolution, Mitchell
employed two transformational techniques: 1) taking the log of the
intensities first; 2) using the ratio of the number of extrema at each
threshold to the number at the next threshold instead of the extrema counts
themselves.

Compared to Haralick's method, Mitchell's max-min texture analysis
performs slightly better, with much simpler computational effort. Like
Haralick's method, this analysis uses the whole of the window as a
classification unit. It also requires a very large window size to obtain
these texture measures. It is therefore not applicable to classifying
individual pixels, owing to the pronounced edge effect induced by a large
window size.

3.2 A NEW MEASUREMENT

To classify individual pixels rather than a group of pixels (windows),
it is proposed that a texture measurement with 17 and 23 variables be
derived from (3 x 3, Model I) and (5 x 5, Model II) windows, respectively.
In the analysis, the window will move from one pixel to another with an
overlapping region between two adjacent pixels, and only the center point is
classified.
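
The moving-window scheme can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, the image is a list of rows, and pixels too close to the border to have a full window are simply skipped, a choice the report does not specify.

```python
def center_windows(image, n=3):
    """Yield (row, col, window) for each pixel with a full n x n neighborhood.

    Successive windows overlap; each one is used to classify only its
    center pixel at (row, col). `n` is assumed to be odd.
    """
    half = n // 2
    rows, cols = len(image), len(image[0])
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [line[c - half:c + half + 1]
                      for line in image[r - half:r + half + 1]]
            yield r, c, window


# A 4 x 4 toy image has four interior pixels for a 3 x 3 window.
toy = [[4 * r + c for c in range(4)] for r in range(4)]
for r, c, window in center_windows(toy, n=3):
    print(r, c, window[1][1])  # the center value equals toy[r][c]
```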

In Model I, the seventeen texture variables are as follows: (1) through
(4) are the four central moments, (5) is the absolute deviation from the
mean, (6) is the contrast of the center point from its neighbors, (7) is the
mean brightness of the center point relative to its background, (8) is the
contrast between adjacent neighbors, (9) is the sum of the squared value of
(8), (10) is the contrast between the second neighbors, (11) is the sum of
the squared value of (10), and (12) through (17) are the mean area above and
below three datum planes (50, 100, 150). The code names and computational
formulas of these seventeen variables are given below:

TABLE 3-1. THE TEXTURE-TONE VARIABLES OF MODEL I

      Code      Description or Computational Formula

 1.   MEAN      average
 2.   STD       standard deviation
 3.   SKEW      skewness
 4.   KURT      kurtosis
                (variables 1 through 4 are the four central moments)
 5.   MDEVN     $\sum |x_i - \bar{x}| / n$, where $x_i$ = tone value of
                individual pixel, $\bar{x}$ = mean
 6.   MPTCON    $\sum |x_i - x_c| / n$, where $x_c$ = tone value of the
                center point
 7.   MPTREL    $\sum (x_c - x_i) / n$
 8.   MINCON    $\sum |x_i - x_j| / n$, i and j are adjacent pixels
 9.   MINSQR    $\sum (x_i - x_j)^2 / n$
10.   M2NCON    $\sum |x_i - x_k| / n$, i and k are second neighbors
11.   M2NSQR    $\sum (x_i - x_k)^2 / n$
12.   MADAT1    numerical calculation of mean area above datum 1 (50)
13.   MADAT2    mean area above datum 2 (100)
14.   MADAT3    mean area above datum 3 (150)
15.   MBDAT1    mean area below datum 1 (50)
16.   MBDAT2    mean area below datum 2 (100)
17.   MBDAT3    mean area below datum 3 (150)
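
A few of the Model I variables can be computed for a single window as in the following Python sketch. It is illustrative only and rests on assumptions the table does not settle: moments are taken as population moments, skewness and kurtosis are standardized by the standard deviation, and n is taken to be the number of pixels in the window (so the center pixel contributes zero to MPTCON and MPTREL). The function name is hypothetical.

```python
import math


def model1_features(window):
    """Return MEAN, STD, SKEW, KURT, MDEVN, MPTCON, MPTREL for one window.

    `window` is a square list of rows of tone values with odd side length.
    """
    values = [v for row in window for v in row]
    n = len(values)                       # assumption: n = pixels in window
    center = window[len(window) // 2][len(window[0]) // 2]
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    return {
        "MEAN": mean,
        "STD": std,
        "SKEW": m3 / std ** 3 if std > 0 else 0.0,  # flat window: set to 0
        "KURT": m4 / std ** 4 if std > 0 else 0.0,
        "MDEVN": sum(abs(v - mean) for v in values) / n,
        "MPTCON": sum(abs(v - center) for v in values) / n,
        "MPTREL": sum(center - v for v in values) / n,
    }


# A bright center pixel on a uniform background.
features = model1_features([[10, 10, 10], [10, 20, 10], [10, 10, 10]])
for code, value in features.items():
    print(code, round(value, 4))
```

For this window the center is brighter than all of its neighbors, so MPTCON and MPTREL coincide; they differ as soon as some neighbors are brighter than the center.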
In Model II, with a (5 x 5) design, in addition to the above seventeen
variables, three measures are extracted to characterize the oscillating
nature of the wave-forms of the scan lines obtained along each of the x and y
axes of the data matrix; thus, six more variables are available for analysis.
The three measures are: 1) sum of the contrast values from peak to trough;
2) sum of the distances of peak positions from the origin; and 3) sum of the
number of peaks and troughs. This means that there are altogether
twenty-three texture variables in Model II.


TABLE 3-2. ADDITIONAL VARIABLES IN MODEL II

      Code      Description or Formula

18.   XCONT     (distances from peaks to troughs) along x-axis
19.   XPEAK     (peak positions from the origin) along x-axis
20.   XPANDT    (number of peaks and troughs) along x-axis
21.   YCONT     (distances from peaks to troughs) along y-axis
22.   YPEAK     (peak positions from the origin) along y-axis
23.   YPANDT    (number of peaks and troughs) along y-axis


3.3 COMPARISONS AMONG THE THREE MEASUREMENTS

In sum, we list the three texture measurements discussed above in terms
of their computational complexity, required window size, and classification
unit for comparative analysis.
                  Haralick                 Mitchell                 Hsu

Computational     Rather complicated       Very simple              Simple
complexity

Required window   Depending on the         Very large               Very small:
size              needed pre-determined                             (3 x 3) or
                  distance                                          (5 x 5)

Classification    Only group of pixels     Group of pixels;         Pixels
unit              being tried; but it      not applicable to
                  is applicable to         classifying
                  classify individual      individual pixels
                  pixels

Hit-rate          ca. 70% (unit of         ca. 80% (unit of         ca. 90% (unit of
reported          analysis: scene)         analysis: scene)         analysis: pixel)


Owing to the classification requirements specified by the AFATL program,
the proposed new texture measurement is preferred to Haralick's and
Mitchell's methods.
SECTION 4


THE DEVELOPMENT OF THE MAHALANOBIS CLASSIFIER

4.1 BACKGROUND

Over the years, researchers have been using various kinds of strategies
to classify image data into meaningful groups. Some of them have mathematical
rigor and others do not. In general, they can be grouped into nonparametric
and parametric methods. Examples of nonparametric methods are minimum
distance to means, minimum distance to nearest member of a class, etc.
Parametric methods used by leading remote sensing centers can be grouped into
two broad categories: 1) maximum likelihood ratio decision rules based on a
Bayesian formulation with an a priori probability framework; and 2) a class
of linear discriminant functions based on posterior probabilities for
classification (Nalepka, 1970; Swain, 1973). All these classification
methods employ training sets (design sets) to define the class
characteristics; therefore, they are called "supervised methods." If
clustering methods are used to group the populations into distinctive
classes, one obtains an "unsupervised" method. Since the unsupervised methods
are very time consuming, they are not generally employed and thus will not be
discussed here.

4.2 THE GENERAL CLASSIFICATION PRINCIPLE

Within the framework of parametric analysis, one employs a general
discriminant analysis to classify an object into one of k types. It is
assumed that the spectral/textural signatures of the objects have density
functions $P_1(Y), \ldots, P_k(Y)$, where $P_i(Y)$ is the density function
for the objects in the ith class. The standard method as given by Rao (1973)
is to compute the numerical value of $P_i(Y)$, where Y is the density of the
unknown object, for each $i = 1, \ldots, k$, and place the object into the
class $i_0$ for which $P_{i_0}(Y)$ is largest. In case the $P_i(Y)$'s are
multivariate normal, this method leads to the usual linear discriminant
function. The method can also be modified to incorporate an a priori
distribution $\pi_i$, $i = 1, \ldots, k$, if a Bayesian approach is desired.
Here, the quantities $\pi_i P_i(Y)$ are computed for $i = 1, \ldots, k$ and
the object whose spectral signature is Y is placed in the class which
maximizes $\pi_i P_i(Y)$ for $i = 1, \ldots, k$.
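
The decision rule above can be sketched in a few lines of Python. The example is a toy: it uses one-dimensional normal densities, and the class names, means, standard deviations, and priors are made-up values chosen only to show how the a priori weights can change the decision. The function names are hypothetical.

```python
import math


def normal_density(mean, std):
    """Return a one-dimensional normal density function (toy stand-in for P_i)."""
    def density(y):
        z = (y - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return density


def classify(y, densities, priors=None):
    """Place y in the class with the largest P_i(y), or pi_i * P_i(y) if priors are given."""
    def weight(name):
        prior = 1.0 if priors is None else priors[name]
        return prior * densities[name](y)
    return max(densities, key=weight)


# Hypothetical tone distributions for two classes.
densities = {"water": normal_density(30.0, 5.0),
             "soil": normal_density(120.0, 20.0)}

print(classify(50.0, densities))                                # soil
print(classify(50.0, densities, {"water": 0.9, "soil": 0.1}))   # water
```

With equal weights the tone value 50 is assigned to the broader soil class, whose density there is higher; a strong prior in favor of water reverses the decision.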

4.3  DIFFERENT  APPROACHES 

The  Bayesian  approach  has  been  reported  by  Fu  (1969)  , and  employed  by 
LARS  of  Purdue  University.  The  linear  discriminant  function  methods  are 
discussed  by  Morrison  (1976).  The  Fisher  Pairwise  logic  of  RADC , and  the 
proposed  Mahalanobis  logic  are  examples  of  the  linear  discriminant  function 
approaches.  Under  the  same  conditions,  these  three  methods  should  perform 
equally  well.  The  difference  in  performance  will  come  from  different  as- 
sumptions that  the  classifier  accepts. 

For example, both the Bayesian and linear discriminant function
approaches assume that 1) the spectral data are multivariate normal, and 2)
they have a common dispersion matrix. Researchers, however, have discovered
that spectral data are generally not normal, nor do they have a common
dispersion pattern. Once we take these two problems into consideration in the
design of the classifier, correct classification rates can be improved
substantially.

4.4  THE  DEVELOPMENT  OF  THE  MAHALANOBIS  CLASSIFIER 

Here we describe the classification scheme used in this study. The
starting point is the general maximum likelihood principle described in
Section 4.2. A parametric form of the probability density function is
chosen in advance. Usually, this form is a multivariate normal distribution.
In this study we first use the normal distribution theory to develop a
Mahalanobis classifier, and later we introduce the stable classifier in
Section 6.

After the choice of a parametric model, a training set is used to
estimate the parameters in the probability densities for each separate class,
e.g., soils, metals, etc. Once the parameters are estimated, the particular
individuals may be classified according to the maximum likelihood principle.
Each individual is characterized by a vector Y whose coordinates consist of
the values of the 17 (or 23, depending on the particular model, 3 x 3 or
5 x 5, being used) variables listed in Table 3-1. Under the normal
distribution theory with the non-Bayesian approach, the probability density
function for the ith class is given by the formula

(1)  $P_i(Y) = (2\pi)^{-p/2} \, |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (Y-\mu_i)^T \Sigma_i^{-1} (Y-\mu_i) \right\}$

where $\mu_i$ is the vector of means and $\Sigma_i$ is the covariance matrix
for each class, and p is the number of variables in Y. The maximum likelihood
principle then dictates that an unknown object be classified into class $i_0$
if $P_{i_0}(Y) > P_j(Y)$ for all j different from $i_0$. Using logarithms,
this rule can be restated as: classify into class $i_0$ if

(2)  $\log|\Sigma_{i_0}| + (Y-\mu_{i_0})^T \Sigma_{i_0}^{-1} (Y-\mu_{i_0}) < \log|\Sigma_i| + (Y-\mu_i)^T \Sigma_i^{-1} (Y-\mu_i)$

for every i different from $i_0$. The quantity

(3)  $D_i^2 = (Y-\mu_i)^T \Sigma_i^{-1} (Y-\mu_i)$

is called the Mahalanobis distance between the pixel whose variable values
are given in the vector Y and the class i whose parameters are $\mu_i$ and
$\Sigma_i$.

Since the values of these parameters $\mu_i$ and $\Sigma_i$ are unknown
beforehand, they must be estimated from the data obtained from the training
class. Once these estimates are obtained, they are then used for classifying
the entire image.
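
The estimation and classification steps can be sketched with NumPy as follows. This is a minimal illustration, not the program used in the study: the function names and the two-variable toy training sets are assumptions, and the covariance matrices here are nonsingular so that an ordinary linear solve can be used.

```python
import numpy as np


def fit_class(samples):
    """Estimate the mean vector and covariance matrix of one training class."""
    x = np.asarray(samples, dtype=float)
    return x.mean(axis=0), np.cov(x, rowvar=False)


def score(y, mean, cov):
    """log|Sigma_i| + (Y - mu_i)^T Sigma_i^{-1} (Y - mu_i), as in rule (2)."""
    diff = np.asarray(y, dtype=float) - mean
    d2 = float(diff @ np.linalg.solve(cov, diff))  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet) + d2


def classify(y, classes):
    """Return the class name with the smallest score."""
    return min(classes, key=lambda name: score(y, *classes[name]))


# Hypothetical training pixels: two texture-tone variables per pixel.
training = {
    "soil": [[10, 2], [12, 3], [11, 1], [13, 2], [9, 3]],
    "metal": [[50, 20], [55, 22], [52, 18], [58, 25], [49, 21]],
}
classes = {name: fit_class(rows) for name, rows in training.items()}

print(classify([11, 2], classes))   # soil
print(classify([53, 21], classes))  # metal
```

Each class keeps its own covariance matrix, so the log-determinant term in rule (2) is retained rather than cancelled.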

During the initial phases of the study, the assumption that all
covariance matrices were equal was made, but quickly discarded in favor of
individual covariance matrices for each class. When this was done, the
covariance matrices were found to be singular. In this case, the inverse
$\Sigma_i^{-1}$ of the covariance matrix cannot be used, but in its place,
the generalized inverse of $\Sigma_i$ must be used. In general, all of the
classification theory holds if one replaces the true inverse in the formula
by its generalized inverse. When the true inverse exists, the algorithm for
producing the generalized inverse actually produces the true inverse.
Conceptually, the simplest method for obtaining generalized inverses is
to use the spectral decomposition of the covariance matrix

(4)  $\Sigma = P \Lambda P^T$

where $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of
$\Sigma$ and P is a matrix whose columns are the eigenvectors of $\Sigma$,
suitably normalized so that the length of each vector is one. A more
efficient algorithm is available for computing generalized inverses. This
algorithm is described by Searle (1971), p. 18. It consists principally of
solving the system of linear equations

(5)  iz’x  = 1 

for X; X will then be the required generalized inverse. The generalized
inverse is not unique, but any generalized inverse used produces exactly the
same classifications. For this reason, we have used the term "the
generalized inverse" rather than "a generalized inverse".
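
The spectral-decomposition route to a generalized inverse can be sketched with NumPy: decompose the symmetric covariance matrix, invert only the nonzero eigenvalues, and reassemble. The function name and the tolerance used to decide which eigenvalues count as zero are assumptions of this sketch, and it illustrates the conceptually simple method rather than the Searle algorithm.

```python
import numpy as np


def generalized_inverse(sigma, tol=1e-10):
    """Generalized inverse of a symmetric matrix via its eigen-decomposition."""
    sigma = np.asarray(sigma, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(sigma)  # sigma = P diag(eigvals) P^T
    cutoff = tol * max(float(np.abs(eigvals).max()), 1.0)
    # Invert the nonzero eigenvalues and leave the (near) zero ones at zero.
    inv_vals = np.array([1.0 / v if abs(v) > cutoff else 0.0 for v in eigvals])
    return (eigvecs * inv_vals) @ eigvecs.T   # P diag(inv_vals) P^T


singular = np.array([[2.0, 2.0], [2.0, 2.0]])   # rank 1: no ordinary inverse
g = generalized_inverse(singular)
print(np.allclose(singular @ g @ singular, singular))  # True

regular = np.array([[2.0, 0.0], [0.0, 4.0]])
print(np.allclose(generalized_inverse(regular), np.linalg.inv(regular)))  # True
```

The first check is the defining property of a generalized inverse; the second shows that the construction returns the true inverse whenever one exists.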

4.4.1  Classification  Rules 

From the discussion in the preceding section, it is clear that the
classification rule states that a pixel should be classified into the class
type (soil, metal, ...) for which the Mahalanobis distance is smallest, which
would also be equivalent to putting the pixel into the class for which the
posterior probability P(G|Y) is largest. If classification of each pixel is
mandatory, then this rule is used. On the other hand, if it is permissible
to have some pixels unclassified, then the alternate probability P(Y|G) that
an object in group G will have a Mahalanobis distance as big as the observed
one for this pixel is found. A cutoff probability is established (generally
a small value) and the pixel is declared unclassified if the probability
that this pixel belongs to group G is less than the cutoff value. During
the study, various cutoff values have been used, such as .01, .001, etc. As
an example of this rule, suppose that the mandatory classification rule
dictates that a pixel whose seventeen measurements are denoted by Y is
classified as metal because metal is the closest class to which this pixel
can be identified. However, the probability that a metal pixel will have
measurements as different from the metal class as this particular one is
small, say .001. In this case, if unclassified pixels are permitted, then it
will be declared unclassified.
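
One way to realize the reject rule is sketched below in Python. It assumes that, under the normal model with known class parameters, the squared Mahalanobis distance of a class member follows a chi-square distribution whose degrees of freedom equal the number of variables (or the rank of the covariance matrix when a generalized inverse is used). That distributional choice, the function names, and the example distances are assumptions of this sketch; the report does not state how its tail probability was computed.

```python
import math


def chi2_tail(d2, df):
    """P(chi-square with integer `df` degrees of freedom >= d2), closed form."""
    if d2 <= 0:
        return 1.0
    half = d2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j < df/2} h^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_j h^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total


def classify_or_reject(distances, df, cutoff=0.01):
    """Pick the nearest class, or 'unclassified' if the fit is too improbable.

    `distances` maps class names to squared Mahalanobis distances.
    """
    best = min(distances, key=distances.get)
    if chi2_tail(distances[best], df) < cutoff:
        return "unclassified"
    return best


print(classify_or_reject({"metal": 15.0, "soil": 95.0}, df=17))  # metal
print(classify_or_reject({"metal": 60.0, "soil": 95.0}, df=17))  # unclassified
```

In the second call metal is still the nearest class, but a metal pixel that far from the class mean would be so rare under the assumed model that the pixel is left unclassified.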

4.4.2 Separation of Classes

The most straightforward method for determining whether or not the
selected classes (metal, soil, etc.) can be separated is to compute the
estimate of the confusion matrix. In computing this estimate one must pay
attention to Foley's principle of sample size. In particular, the number of
samples selected for each class must be at least three times the number of
measurements associated with each pixel. (One can actually use the rank of
the covariance matrix for the measurements in case there are multiple
collinearities in the data. This would reduce the required sample size
somewhat, although sample size was not a problem in this study.)

The  estimate  of  this  matrix  is  simply  an  array  of  the  percentages  of 
cases  mlsclassif led  into  each  of  the  classes.  If  all  of  the  cases  are  cor- 
rectly classified,  then  separation  is  perfect.  In  this  case,  it  is  probably 
possible  to  reduce  the  number  of  measurements  used  at  each  pixel. 
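
As an illustration of the estimate described above, the following Python
sketch tabulates a confusion matrix in percent from lists of true and
assigned labels. The function names and the sample labels are
hypothetical; this is a sketch of the bookkeeping only, not the study's
software.

```python
from collections import Counter


def confusion_matrix(true_labels, assigned_labels, classes):
    """Rows are true classes, columns are assigned classes, entries in percent."""
    counts = Counter(zip(true_labels, assigned_labels))
    matrix = {}
    for t in classes:
        row_total = sum(counts[(t, a)] for a in classes)
        matrix[t] = {
            a: (100.0 * counts[(t, a)] / row_total if row_total else 0.0)
            for a in classes
        }
    return matrix


def perfectly_separated(matrix):
    """True when every training case falls on the diagonal."""
    return all(matrix[t][t] == 100.0 for t in matrix)


# Hypothetical training-set results: one metal pixel is assigned to soil.
true = ["metal"] * 4 + ["soil"] * 4
assigned = ["metal", "metal", "metal", "soil"] + ["soil"] * 4
cm = confusion_matrix(true, assigned, ["metal", "soil"])
print(cm["metal"])  # {'metal': 75.0, 'soil': 25.0}
```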

Stepwise discriminant analysis can be utilized to select the measurements
or features which are most useful for discriminating between classes. This
stepwise procedure consists of selecting, one at a time, the features which
contribute most toward the separation of groups. The selection procedure can
be stopped as soon as enough features have been selected to produce a
complete separation of the groups. In the stepwise discriminant procedure,
an F test based on the likelihood ratio criterion is made to select the
features, rather than an analysis of the confusion matrices.

4.4.3  Ordination Procedures

At each pixel, a set of features or measurements is taken which
describes the texture and tone of that particular pixel. During the initial
phases of the study, the major objective was to select the features which
would contribute most to the separation of the training classes. Initially a
large number of texture features were chosen. Principal components and
related factor analysis methods were used to determine the number of
non-zero eigenvalues in the covariance matrix of the features. This
information describes the number of essentially distinct features which
exist within the data.
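
A minimal Python sketch of the eigenvalue count follows. It assumes NumPy
is available and treats an eigenvalue as zero when it is negligible
relative to the largest one. The tolerance and the function name are
assumptions made for illustration, not values taken from the study.

```python
import numpy as np


def count_distinct_features(samples, rel_tol=1e-8):
    """Count eigenvalues of the feature covariance matrix that are not negligible.

    samples: array of shape (n_pixels, n_features), with at least two features.
    """
    cov = np.cov(np.asarray(samples, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)  # covariance matrices are symmetric
    return int(np.sum(eigenvalues > rel_tol * eigenvalues.max()))


# Hypothetical data: the third feature is the sum of the first two, so only
# two essentially distinct features are present.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
print(count_distinct_features(np.column_stack([a, b, a + b])))
```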
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SECTION  5 


THE  ANALYSIS 

5.1  THE  DATA  SET 

The data set for this study is composed of eight scenes of RADC's
Northeast Test Area: Griffiss AFB, New York (GALA, GAHA); Verona, New York,
POL Storage (VPLA, VPHA); Stockbridge, New York, SAM Site (SBLA, SBHA); and
Utica, New York, Rail Yards (URLA, URHA), each at both low altitude (LA) and
high altitude (HA). Their geographic locations, elevations and flight
heights are given in Table 5-1.

TABLE 5-1.  THE DATA SET*

Scene  Image  Geographic Coordinates  Elevation  Flight Height
-----  -----  ----------------------  ---------  -------------
1      GALA   43°14'N, 75°25'W        515'       15,500'
       GAHA                                      61,500'
2      VPLA   43°08'N, 75°36'W        500'       15,500'
       VPHA                                      60,500'
3      SBLA   43°02'N, 75°39'W        1290'      16,000'
       SBHA                                      60,500'
4      URLA   43°07'N, 75°13'W        410'       15,400'
       URHA                                      60,500'

* RADC-TR-76-196, Final Report by PAR, pp. 8-9.

After digitization the ground resolution of the low altitude and high
altitude images is approximately 8.75 feet and 56.75 feet, respectively.

It should be noted that the original images have a much higher resolution
level. Stored on tapes, each scene is then composed of (256 x 256) pixels,
with tonal densities ranging from 0 (black) to 255 (white).

For hit-rate analysis, high resolution photographs were provided by RADC
as the basis of the ground truth information to be obtained by manual
photo-interpretation.

5.2  THE HARDWARE FACILITY

To carry out this study, the DICIFER system at RADC and the IBM 370-158
general purpose computer at SUNY-Binghamton were utilized. While the RADC
hardware system was used to 1) digitize the image data, 2) store the data
on computer compatible tape, 3) select the initial training sets, and
4) generate the color decision maps, the IBM 370 system was used mainly to
process the data with the software developed at Binghamton.

The IBM system was also employed to generate the tone maps and the
final decision maps with the printer. This allows the researcher to select
the appropriate training sets manually for the classifier. The decision map
was translated into a numerical classification of each group according to
(10 x 10) cells, which was then used to check against the manual
interpretation result for a hit-rate analysis.

5.3  THE SOFTWARE SYSTEM

The computer programs used in processing the image data include the
following capabilities:

(1)  Texture-tone analysis using an (n x n) window size to generate
     17 to 23 texture variables for each pixel.

(2)  The Mahalanobis logic for classifying pixels into one of the
     design sets or a reject category.

(3)  A generalized-inverse scheme to invert singular or near-singular
     matrices.

(4)  Generation of numerical classification results according to
     a (10 x 10) cell as the basis of hit-rate analysis.

(5)  Stepwise discriminant analysis to select significant texture
     variables.

(6)  Pre-processing capability including principal components,
     factor analysis, etc.

(7)  Generation of decision maps using the IBM 370 system.

(8)  Generation of the confusion matrix with the training set data.

(9)  Generation of D²-distances with probability levels between
     classes.

(10) Manual selection of training sets.

5.4  GENERATION OF A DECISION MAP

Decision  maps  are  produced  using  the  following  computational  steps: 

(1)  Generate  the  texture-tone  variables  for  the  training  sets  and 
the  unclassified  set. 

(2)  Compute  the  parameters  for  the  discriminant  function. 

(3)  Classify  the  training  sets,  thereby  obtaining  the  confusion 
matrix. 

(4)  Classify  the  unknown  set. 

Step one is done by the program "GENVAR." This program reads control
cards specifying the position and size of each training set. The variables
are computed as documented in the GENVAR program and written onto a
disk data set.

The remaining steps are done by the program "SPCMAP." It reads control
cards instructing it as to where the training sets and test set may be
found, and how many points are contained in each. It also reads titles for
the training sets and lists of symbols to be used for the map output. For
each training set, the centroid vector and covariance matrix are computed.
The covariance matrices are inverted using a generalized inverse scheme. In
order to classify a point, the quadratic forms of the differences between
the point vector and each group centroid, taken over the corresponding
group inverse covariance matrices, are computed. This quadratic form is the
Mahalanobis D². The group which is closest to the given point is chosen.

If the user wishes, a map may be generated in which points that do not
strongly belong to any training set are excluded or classified as rejects.
This is possible because D² is a chi-square random variable (under the
normality assumption) and therefore has an associated probability value. If
its probability is below a certain fixed cutoff, the point is rejected.
Rejects are left blank on the decision maps.
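
The classification steps can be sketched in Python as follows. This is
not the SPCMAP program. NumPy's Moore-Penrose pseudo-inverse
(np.linalg.pinv) stands in for the generalized inverse scheme, whose
details are not given here, and the reject rule is expressed as a cutoff on
D² itself (the hypothetical parameter d2_cutoff), which is assumed to be
the chi-square critical value for the chosen probability. For two features
that value is -2 ln(p), about 9.21 for p = 0.01. The training data and
symbols are made up for illustration.

```python
import numpy as np


def fit_groups(training_sets):
    """Centroid and generalized-inverse covariance for each training set."""
    params = {}
    for name, points in training_sets.items():
        x = np.asarray(points, dtype=float)   # shape (n_points, n_features >= 2)
        centroid = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        # The pseudo-inverse tolerates singular or near-singular matrices.
        params[name] = (centroid, np.linalg.pinv(cov))
    return params


def decision_map(pixels, params, symbols, d2_cutoff=None):
    """Classify a (rows, cols, features) array; rejects are left blank."""
    rows, cols, _ = pixels.shape
    names = list(params)
    lines = []
    for r in range(rows):
        line = ""
        for c in range(cols):
            d2 = []
            for name in names:
                centroid, inv_cov = params[name]
                diff = pixels[r, c] - centroid
                d2.append(float(diff @ inv_cov @ diff))  # Mahalanobis D²
            best = int(np.argmin(d2))
            rejected = d2_cutoff is not None and d2[best] > d2_cutoff
            line += " " if rejected else symbols[names[best]]
        lines.append(line)
    return lines


# Hypothetical training sets with two features per pixel.
training = {
    "metal": [(198, 29), (202, 31), (200, 28), (201, 32), (199, 30)],
    "soil": [(118, 9), (122, 11), (120, 8), (121, 12), (119, 10)],
}
groups = fit_groups(training)
scene = np.array([[[200, 30], [120, 10]], [[119, 11], [150, 18]]], dtype=float)
for row in decision_map(scene, groups, {"metal": "M", "soil": "S"}, d2_cutoff=9.21):
    print(repr(row))
```

With the cutoff, the last pixel lies far from both centroids and is left
blank. Without it, the pixel is forced into the nearer group.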

5.5  HIT-RATE  ANALYSIS 

To assess the performance of the developed texture measures, hit-rate
analyses of the test sites have been carried out. The procedures include
1) placing a (10 x 10) grid onto both the computer decision map and the
photo print of the test area, and 2) estimating and enumerating the
percentage of all terrain type classes in each cell. The hit-rate is
computed as:

\[
\text{Hit-rate} = 1 - \frac{\text{difference between photo-interpretation and computer decision map}}
                           {\text{photo-interpretation (in terms of total area of each class)}}
\]

or

\[
\text{Hit-rate} = 1 - \text{error-rate}
\]

The following table summarizes the results of the analyses. It can be
concluded that for a larger area a hit-rate of about 90% can be obtained with
properly digitized images. The hit-rate for small areas is statistically
meaningless. It has been found that digitizing errors exist in the high
altitude images (GAHA, VPHA, URHA); thus hit-rates for these frames must be
obtained by using sub-groups within categories. For instance, in VPHA two
types of cultivated fields were used in the training sets.

TABLE 5-2.  HIT-RATES

Vegetation   Cultivated Field   Metal   Soil   Pavement   Water   Composition
GALA 

88.4% 

98.46% 

90% 

53.13% 

92. :8% 

— 

SBLA 

89.81% 

82.59% 

(in  re- 
jects) 

87.13% 

Too  small  an 
area  for  mean- 
ingful assess- 
ment . 

URLA 

(in  rejects) 

__ 

80.50% 

45.90% 

85.24% 

87.40% 

VPLA 

90.00% 

35.5% 

93.0%

86.00% 

GAHA* 

(Veg.  & Cul. 
field) 

88.51% 

— 

85.53% 

72% 

75.9% 

SBHA 

** 

VPHA 
(5  X 5) 

99% 

60%

95% 

84.1%

— 

98% 

93.75% 

70.1% 

95% 

85.10% 

URHA


*   Uneven densities for cultivated field, NE corner vs. SW corner. Thus
    vegetation and cultivated fields are treated as one group.

**  Uneven tones for cultivated field due to "digitizing error," which
    induced confusion between vegetation and cultivated fields. (Top
    one-third vs. lower two-thirds.)

*** No meaningful hit-rates can be obtained due to "digitizing error."

5.5.1  GALA

The analysis shows that a hit-rate over 90% (except for soil) has been
achieved by Model I. It should be noted that the photo-interpretation of
the ground truth is obtained from a high resolution aerial photo rather than
from the low resolution images from which the computer decision map was
derived. The author has investigated further the problem regarding the soil
class using the output from Model II. It was first thought to be the "edge
effect." However, since the mis-classification of the soil pixels was
largely eliminated in Model II, it was determined instead to be a
"resolution effect," which was purposely induced into the images during the
process of digitization. The performance of Model II is better than that of
Model I, except that it has a larger area of rejects and an occasionally
pronounced edge effect.

5.5.2  SBLA

In general, the overall terrain pattern came out very well in the
decision map. The SAM site and tanks (metal objects) were correctly
identified using the reject category.

As with GALA, the "resolution" effect occurred at the "edge" of two
distinctive classes, and at certain vegetation areas.

The reject regions were about 10% of the total area. There was no
significant difference between the "reject" pattern determined by P(X/G) =
0.01 and that by P(X/G) = 0.001. This means that the pixels being rejected
were really different from the design sets.

5.5.3  VPLA

The overall terrain pattern in the decision map was good in the sense
that essential types were correctly identified. In terms of a detailed
hit-rate analysis, the correct classification rate is about 85% (excepting
pavement). Two factors caused the error rate: 1) asphalt-paved road could
not be differentiated from fields used for recreational purposes; and 2) a
new concrete road was being built at the time the image was taken, so many
types of "pavement" were present at this section of the image. If
cultivated field and pavement were treated as one group, the hit-rate would
be over 95%.

To achieve a correct classification of this frame, four types of
cultivated field were used in the training sets to cover significant local
variations. In terms of the training set itself, a hit-rate of 98.4% was
achieved. However, in terms of the test set, the hit-rate is much lower due
to significant local variations.

5.5.4  URLA

URLA was a more complicated frame, thus an iterative process was
utilized to generate the decision maps. The more obvious classes, such as
metal, pavement, composition, etc., were processed first, and the
"uncertain" and insignificant (in terms of areal coverage) vegetation was
left out. The "reject" area thus represents mixed water, vegetation and
cultivated fields, etc. At both the 0.01 and 0.001 probability reject
levels, the area showing "rejects" is very small, corresponding to a
potential area of mixed water and vegetation.

5.5.5  SBHA

This was the only frame in the high altitude image group that had few
digitization problems. The generation of the decision map was therefore
rather straightforward. Owing to less complexity in the terrain
configuration, a very high hit-rate was achieved (over 95%).

5.5.6  VPHA

Image digitization error existed in the frame; specifically, the upper
one-third is much lighter than the lower two-thirds portion of the frame.
Using the RADC DICIFER system, it was determined that a 30-point difference
existed between these two portions of the frame for the cultivated field
category.

5.5.7  GAHA and URHA

The same digitization problem caused the NE corner of GAHA to be much
lighter than the same terrain types in the SW corner. To process this
frame, two artificial types of cultivated fields had to be used in the
design sets. Since vegetation and cultivated field classes were really
confused by this digitization effect, they were grouped as one class in the
hit-rate analysis.

We were unable to obtain a reliable hit-rate for URHA due to the same
digitization problem. However, we were able to produce a fairly good
decision map in terms of the overall terrain pattern.


5.6  GENERAL COMMENT ON THE DECISION MAP MAKING

In addition to the feature extractor and the classifier, the hit-rate
and false alarm rate also depend on the sample size, the location, and the
number of the training sets.

The minimum sample size problem has been investigated by Foley. His
principle states that for a valid analysis the minimum sample size is three
times as large as the number of the variables used. For instance, if one
employs ten texture variables in the analysis, the minimum size of each
training set is 30. It is also our experience that the Foley principle is
valid and that, empirically, the sample size of each training set should in
general be greater than 30 pixels.
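
The arithmetic of the three-to-one rule can be written as a short Python
check. The function names and the training-set sizes below are
hypothetical and serve only to illustrate the rule as stated above.

```python
def minimum_training_size(n_variables, factor=3):
    """Smallest acceptable training-set size under the three-to-one rule."""
    return factor * n_variables


def undersized_training_sets(set_sizes, n_variables, factor=3):
    """Names of training sets with fewer pixels than the rule requires."""
    needed = minimum_training_size(n_variables, factor)
    return sorted(name for name, size in set_sizes.items() if size < needed)


print(minimum_training_size(10))  # 30, the example given in the text
print(minimum_training_size(17))  # 51 for seventeen texture variables
# Hypothetical training-set sizes, checked against seventeen variables.
print(undersized_training_sets({"metal": 60, "soil": 40, "water": 51}, 17))
```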

Improper training sets generally lead to a low hit-rate. To avoid such
an error, one should first employ the confusion matrix (from the training
sets) to identify confused classes, and to locate mis-classified pixels on
the (preliminary) decision map. Then, one should change the location of the
training sets in order that "pure" training sets can be obtained. This is
an iterative process, and it can be done manually or by the operator using
the interactive graphics, namely, using a cursor on the color monitor with
a terminal control. Once the correct classification rate in the design set
reaches a level of 90% or over, the investigator can proceed to classify the
test set data.

To classify the test set, one can classify one group at a time, or
classify many groups in one process. Theoretically, the first method will
yield a lower hit-rate because there is only one probability value for each
pixel to be used in the classification, which may not be the maximum once
other groups are introduced. Most likely, this method will produce
overlapping groups, that is, an individual pixel may belong to several
groups.

To assure that the test sets are properly classified, all the desired
groups should be introduced in the design set. Furthermore, if local
differences exist within one group, sub-groups should be introduced. These
sub-groups can be labeled as one group only after the decision map is
produced.

It is our experience that a sufficient number of groups should be used in
the design sets; otherwise, mis-classifications or rejects will be
substantial.

VPLA: ORIGINAL DIGITIZED PHOTO


VPHA: ORIGINAL DIGITIZED PHOTO

SBLA: ORIGINAL DIGITIZED PHOTO


SBHA: ORIGINAL DIGITIZED PHOTO

DECISION MAP: GALA (3 X 3) WITH NO REJECTS


DECISION MAP: GALA (3 X 3) WITH REJECTS (DEEP BLUE)

URLA (3 X 3) WITH NO REJECTS


SECTION  6 


ESTIMATION  OF  STABLE  PARAMETERS 
6.1  CHARACTERISTICS  OF  STABLE  DISTRIBUTIONS 

As part of this study, stable distributions were considered as
alternatives to the multivariate normal distributions on which the
Mahalanobis classifier is based.

Stable distributions are best defined in terms of their characteristic
function ψ(t) or its logarithm, which in the univariate case (single
feature) is given by:

\[
\log_e \psi(t) = \log_e \int_{-\infty}^{\infty} e^{itx} \, dF(x)
             = i\delta t - \gamma |t|^{\alpha}
               \left[ 1 + i\beta \frac{t}{|t|} \, \omega(t,\alpha) \right]
\tag{1}
\]

where

\[
\omega(t,\alpha) =
\begin{cases}
\tan(\pi\alpha/2), & \text{if } \alpha \neq 1 \\
\frac{2}{\pi} \log |t|, & \text{if } \alpha = 1
\end{cases}
\]

and δ is a location parameter, γ ≥ 0 is a scale parameter, α is the
characteristic and β is the symmetry parameter. The parameter δ plays the
role of the mean and is equal to the mean whenever the mean exists. The
variance is always infinite when α < 2; however, the parameter γ plays the
role of a scale parameter and β is the symmetry parameter. In particular,
if β = 0, the distribution is symmetric about δ. The parameter α is called
the characteristic of the distribution, and 0 < α ≤ 2. If α = 2, then

\[
\log_e \psi(t) = i\delta t - \gamma t^{2}
\]

which is the characteristic function of a univariate normal distribution
with mean δ and variance 2γ. If α = 1 and β = 0, the distribution is
Cauchy. For all other values of α, the density exists but a closed formula
for it is not known. Various power series expansions for the density exist
which may be found, for example, in DuMouchel (1971) and Feller (1966).
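
Equation (1) can be evaluated directly, as in the Python sketch below.
The function name and the parameter values are illustrative assumptions.
The sketch only checks the two special cases named in the text: α = 2
reduces to the normal form iδt - γt², and α = 1 with β = 0 reduces to the
Cauchy form iδt - γ|t|.

```python
import math


def log_stable_cf(t, alpha, beta, gamma, delta):
    """Logarithm of the stable characteristic function, equation (1)."""
    if t == 0:
        return 0j  # psi(0) = 1 for any distribution
    if alpha == 1:
        omega = (2.0 / math.pi) * math.log(abs(t))
    else:
        omega = math.tan(math.pi * alpha / 2.0)
    sign = 1.0 if t > 0 else -1.0  # the factor t / |t|
    return 1j * delta * t - gamma * abs(t) ** alpha * (1 + 1j * beta * sign * omega)


t, gamma, delta = 0.7, 1.5, 0.3

# alpha = 2: tan(pi) vanishes (up to rounding), leaving the normal form.
normal_form = complex(-gamma * t * t, delta * t)
print(abs(log_stable_cf(t, 2.0, 0.5, gamma, delta) - normal_form) < 1e-12)

# alpha = 1, beta = 0: the Cauchy form.
cauchy_form = complex(-gamma * abs(t), delta * t)
print(abs(log_stable_cf(t, 1.0, 0.0, gamma, delta) - cauchy_form) < 1e-12)
```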

The most important parameter for a stable distribution is α because it
determines the type of the distribution. When α = 2, the distribution is
normal. In this study, the estimation of α is used to assess the normality
or lack of normality of the data. If the estimate for α is close to 2, then
the data may be assumed to be normal. The results of this study indicate
that many of the features have distributions which are not normal.

The main reasons for considering stable distributions are given here.

The first advantage of stable distributions lies principally in the
generalized central limit theorem. Among other things, it states that if
X_1, ..., X_n are independent identically distributed random variables
having any distribution with finite variance, the distribution of their
(suitably normalized) sum will tend toward the normal distribution as n
increases. This is an exceedingly powerful result, as it shows that if an
observable random variable is produced as the sum of many independent
nearly identically distributed random variables, its distribution will be
approximately normal, no matter what the distribution of the underlying
variables. If the variance is not finite, a limiting distribution for the
sum may still exist. The vital point is that if it does exist, it is a
stable distribution. Every member of the stable family is such a limit, and
no distribution other than a stable distribution may be such a limit. This
unique property gives stable distributions an important position in
statistical theory and practice.

One more property of stable distributions, which does not have the
theoretical impact of the generalized central limit theorem but nevertheless
makes them valuable as models of empirical results, is the following.
Experimental image data by no means need be normal. Mixed in with a bulk of
roughly normal observations may be one or two outliers. A whole body of
literature has accumulated on what to do with them. The question usually
asked is whether to keep them as valid measurements which will admittedly
grossly affect the results, or discard them as noise. The principal problem
is that most widely available statistical tests are incapable of properly
handling empirical distributions in which the sum of a set of random
variables is largely dominated by one of the observations. For this reason,
the outliers are usually discarded. The method of choice would seem to be to
keep all the data, but use a method of analysis which is capable of dealing
fairly with such distributions. Work in the field of economics by
Mandelbrot (1963) has indicated that stable methods are well suited to this
task. Feller (1966) also presents a small but interesting survey of
physical processes governed by stable laws.

One of the important properties of multivariate normal distributions is
that every linear combination of its components has a univariate normal
distribution. This property also carries over to stable distributions. That
is, every linear combination of the components of a multivariate stable
distribution has a univariate stable distribution.

Another important property of stable distributions is that the sum of
two independent stable variates with the same characteristic α is itself
stable with the same characteristic α as the summands.

A final advantage to modelling by means of stable distributions is that
skewed distributions can be accommodated.

There is a theory of multivariate stable distributions which is similar
to the theory of multivariate normal distributions and in fact contains the
normal theory as a special case. As in the univariate case, the
multivariate stable distribution is best described by its characteristic
function. Details of the multivariate stable distributions are found in
Press (1972).

6.2  ESTIMATION  METHODS 

Various methods for estimation of stable parameters have been proposed
in the statistical literature. During this study, these methods have been
evaluated for their practical value. Some methods have been found to be
reasonably useful while others are useless.

For symmetric distributions, a relatively easy method for computing
estimates of the parameters α, δ and γ is given by Fama and Roll (1968,
1971). They have shown that truncated means are good (and, obviously,
unbiased) estimators of the location δ. The degree of truncation which
provides minimum error variance is a function of α, but using the central
fifty percent gives quite good results. They also show that

\[
\hat{c} = 0.605 \, (\hat{X}_{.72} - \hat{X}_{.28}),
\]

where \hat{X}_{P} is the estimate of the P fractile of the sample, is a
reasonable estimator of the scale. It has small asymptotic bias. Its error
variance, though small enough for non-critical work, is significantly larger
than the Rao-Cramer lower bound. They also investigate the approximation of
α by choosing α such that, for some previously specified fractile P, the
theoretical fractile of the symmetric stable distribution with
characteristic α is the same as the sample fractile. Again, the optimal
value for P varies with the true α and the sample size. These estimates of
α showed some small bias and an error variance which is considerably larger
than the best attainable. However, all three estimators are excellent when
one considers their elegant simplicity.
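
The location and scale estimators described above are simple enough to
sketch in a few lines of Python. The fractile routine below uses linear
interpolation between order statistics, which is one of several common
conventions and is an assumption here; the function names and the sample
data are likewise hypothetical.

```python
def fractile(sorted_x, p):
    """Sample P fractile by linear interpolation between order statistics."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def truncated_mean(x, keep=0.5):
    """Mean of the central `keep` fraction of the sample (location estimate)."""
    s = sorted(x)
    cut = int(len(s) * (1.0 - keep) / 2.0)
    core = s[cut:len(s) - cut]
    return sum(core) / len(core)


def scale_estimate(x):
    """Scale estimate 0.605 * (X_.72 - X_.28) from the sample fractiles."""
    s = sorted(x)
    return 0.605 * (fractile(s, 0.72) - fractile(s, 0.28))


# Hypothetical sample: the integers 0..100 with one gross outlier appended.
sample = list(range(101)) + [1.0e6]
print(truncated_mean(sample))
print(scale_estimate(sample))
```

Both estimates are barely moved by the outlier, whereas the ordinary
sample mean of the same data is close to 10,000.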

DuMouchel (1971) did some work on maximum likelihood estimation of
stable parameters. The data are censored into discrete classes. The
likelihood function is evaluated for trial parameters using asymptotic
formulas for extreme values and the fast Fourier transform of the
characteristic function for central values. This method has two drawbacks.
First, the censoring of the data causes some information loss. Second, and
most serious, is its extreme slowness. Each iterative try for a better
parameter estimate requires that the characteristic function be evaluated at
many points, and that several Fourier transforms be done.

Other methods have been proposed based on characteristic functions.
These methods in general have been shown to be unworkable. The method of
maximum likelihood is the best method for obtaining estimates in the case of
stable distributions.

At this time there are no known methods for explicitly evaluating stable
densities and distribution functions to any realistic precision rapidly
enough to be feasible for production work. Therefore, it has been the goal
of this study to investigate algorithms which are quite slow but accurate.
These are then used to generate tables suitable for interpolation. Means of
storing and accessing these tables for the evaluation of stable densities and
distributions at high speed have also been developed.

All of the rather complex details can be found in Masters (1977). A
large number of Monte Carlo experiments were run in order to test the
algorithm for computing these parameter estimates. All of the estimators
showed very small bias and standard error for most values of α. Some
problems were noted when α is close to 1 or 2. Fortunately, values of α
close to 1 do not seem to occur. For α = 2, one can use estimators for
normal distributions.

6.3  PRELIMINARY RESULTS - NON-NORMAL BEHAVIOR OF THE TEXTURE VARIABLES


In order to assess the normality assumption for the texture variables,
estimates of the stable distribution parameters were made for the data
available from the scenes VPLA, VPHA, SBLA and GAHA. Some results of these
estimates are presented in Table 6-1. The complete set of seventeen texture
variables is given for four separate classes: pavement, vegetation,
cultivated fields (1) and soil. The estimated value of α is given in the
table. The largest value of α given by the estimation algorithm is 1.99, and
the smallest is 1.01. Thus a value of 1.99 means that the estimate is
between 1.99 and 2.00. A value of 1.99 indicates that the data are normally
distributed, whereas smaller values indicate that stable models are more
appropriate. The complete set of values run is too large to present. A
total of 476 different variables have been estimated over the four
photographs. Of these, 263 had α values different from 1.99. This means
that 55% of the texture variables are not normally distributed.

A complete set of estimates for the β parameter is not available, but
preliminary estimates indicate that β does deviate from 0 (the value for
symmetric distributions).

These results indicate that development of a stable classifier will
produce good results.

TABLE 6-1.  VALUES OF α FOR VPHA

Texture Variable   Pavement   Vegetation   C. Field 1   Soil
----------------   --------   ----------   ----------   ----
 1.  MEAN            1.20        1.99         1.99      1.15
 2.  STD             1.59        1.20         1.71      1.99
 3.  SKEW            1.31        1.99         1.99      1.99
 4.  KURT            1.01        1.53         1.58      1.45
 5.  MDEVN           1.99        1.21         1.69      1.99
 6.  MPTCON          1.18        1.35         1.69      1.01
 7.  MPTREL          1.99        1.99         1.84      1.08
 8.  MINCON          1.24        1.22         1.55      1.41
 9.  MINSQR          1.27        1.21         1.64      1.99
10.  M2CON           1.99        1.26         1.63      1.50
11.  M2NSQR          1.99        1.24         1.78      1.66
12.  MADAT1          1.18        1.99         1.99      1.15
13.  MADAT2          1.15        1.01         1.01      1.18
14.  MADAT3          1.01        1.01         1.01      1.99
15.  MBDAT1          1.01        1.99         1.01      1.01
16.  MBDAT2          1.85        1.99         1.99      1.01
17.  MBDAT3          1.34        1.99         1.99      1.01